Auto-tuning for Energy Usage in Scientific Applications
Abstract
The power wall has become a dominant impediment in exascale system design. It is therefore important to understand how to write application software that minimizes power usage while maintaining satisfactory performance. In this work, we use existing software and hardware facilities to tune applications for several combinations of power and performance objectives. The tuning is done with respect to software-level, performance-related tunables (cache tiling factors and loop unrolling factors) as well as processor clock frequency. These tunable parameters are explored via an offline search to find the parameter combinations that are optimal with respect to performance (or delay, D), energy (E), energy×delay (E × D), and energy×delay×delay (E × D²). The searches are applied to a parallel application that solves Poisson's equation using stencil computations. Stencil (nearest-neighbor) computations are very common operations in today's scientific applications. We show that the parameter configuration that minimizes energy consumption can save, on average, 5.4% energy with a performance loss of 4% when compared to the configuration that minimizes runtime. With this work, we also provide evidence that opportunities exist to auto-tune for energy in parallel applications.
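The offline search described in the abstract can be illustrated with a minimal sketch. It assumes an exhaustive sweep over a small grid. The parameter ranges and the `measure` function are hypothetical: the abstract gives no tunable values, and `measure` is a synthetic toy model standing in for actually running the stencil code and reading a power meter. Only the four objectives (D, E, E × D, E × D²) come from the abstract.

```python
from itertools import product

# Hypothetical tunable ranges; the abstract names the kinds of tunables
# (tiling, unrolling, clock frequency) but not their values.
TILE_SIZES = [16, 32, 64]
UNROLL_FACTORS = [1, 2, 4]
FREQUENCIES_GHZ = [1.6, 2.0, 2.4]


def measure(tile, unroll, freq_ghz):
    """Synthetic stand-in for one instrumented run (an assumption, not real data).

    Returns (delay_seconds, energy_joules) from a toy cost model.
    """
    # Toy model: tile size 32 fits best, and unrolling reduces the work.
    work = 100.0 * (1.0 + abs(tile - 32) / 64.0) / (1.0 + 0.1 * (unroll - 1))
    delay = work / freq_ghz
    # Toy power model: constant static power plus cubic dynamic power (watts).
    power = 20.0 + 10.0 * freq_ghz ** 3
    return delay, power * delay


# The four objectives named in the abstract, as functions of delay d and energy e.
OBJECTIVES = {
    "D": lambda d, e: d,
    "E": lambda d, e: e,
    "ExD": lambda d, e: e * d,
    "ExD^2": lambda d, e: e * d * d,
}


def offline_search():
    """Measure every configuration once, then pick the best one per objective."""
    configs = product(TILE_SIZES, UNROLL_FACTORS, FREQUENCIES_GHZ)
    results = {cfg: measure(*cfg) for cfg in configs}
    best = {
        name: min(results, key=lambda cfg: metric(*results[cfg]))
        for name, metric in OBJECTIVES.items()
    }
    return results, best


if __name__ == "__main__":
    results, best = offline_search()
    for name, cfg in best.items():
        delay, energy = results[cfg]
        print(f"{name}: tile={cfg[0]} unroll={cfg[1]} freq={cfg[2]} GHz "
              f"delay={delay:.2f} s energy={energy:.1f} J")
```

Each configuration is measured once and the results are reused across objectives, so adding an objective costs no extra runs. In this toy model the delay-optimal and energy-optimal configurations differ only in clock frequency, which mirrors the kind of trade-off the abstract reports, though the numbers here carry no empirical meaning.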
Similar resources
Resource-Efficient, Hierarchical Auto-Tuning of a Hybrid Lattice Boltzmann Computation on the Cray XT4
We apply auto-tuning to a hybrid MPI-pthreads lattice Boltzmann computation running on the Cray XT4 at the National Energy Research Scientific Computing Center (NERSC). Previous work showed that multicore-specific auto-tuning can improve the performance of lattice Boltzmann magnetohydrodynamics (LBMHD) by a factor of 4× when running on dual- and quad-core Opteron dual-socket SMPs. We extend these stud...
Multi-Objective Auto-Tuning with Insieme: Optimization and Trade-Off Analysis for Time, Energy and Resource Usage
The increasing complexity of modern multi- and many-core hardware design makes performance tuning of parallel applications a difficult task. In the past, auto-tuners have been successfully applied to minimize execution time. However, besides execution time, additional optimization goals have recently arisen, such as energy consumption or computing costs. Therefore, more sophisticated methods capa...
Offline Auto-Tuning of a PID Controller Using Extended Classifier System (XCS) Algorithm
Proportional + Integral + Derivative (PID) controllers are so widely used in engineering applications that more than half of industrial controllers are PID controllers. There are many methods for tuning the PID parameters in the literature. In this paper, an intelligent technique based on the eXtended Classifier System (XCS) is presented to tune the PID controller parameters. The PID controlle...
Auto-tuning Parallel Programs at Compiler- and Application-Levels
Auto-tuning has recently received its fair share of attention from the High Performance Computing community. Most auto-tuning approaches are specialized to work either on specific domains (dense/sparse linear algebra, stencil computations, etc.) or only at certain stages of program execution (compile-time, launch-time, or run-time). Real scientific applications, however, demand a cohesive environmen...
Atune-IL: An Instrumentation Language for Auto-tuning Parallel Applications
Automatic performance tuning (auto-tuning) has been used in parallel numerical applications for adapting performance-relevant parameters. We extend auto-tuning to general-purpose parallel applications on multicores. This paper concentrates on Atune-IL, an instrumentation language for specifying a wide range of tunable parameters for a generic auto-tuner. Tunable parameters include the number of...